The pattern discovery power

It is interesting to examine whether the word length (the k value of the k-mer approach) matters. Based on the above two comparisons, the k value was varied from two to six to examine whether the accuracy of the alignment-free approach can be maintained. Figure 7.15 shows the result. It can be seen that when the word length increases, the correlation between the alignment distance and the k-mer distance decreases. This means that it is better to use shorter words when applying the alignment-free approach to sequence comparison, so as to maintain the discovery power.

Figure 7.15 An examination of whether the k-mer word length matters for the alignment-free approach to maintain the accuracy in multiple sequence comparisons.
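The kind of check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four toy sequences are invented, the k-mer distance is taken to be the Euclidean distance between relative k-mer frequency profiles, and a plain edit distance stands in for the alignment distance. The function names (`kmer_profile`, `kmer_distance`, `edit_distance`, `pearson`) are hypothetical, and none of these choices is confirmed to match the procedure behind Figure 7.15.

```python
from itertools import combinations, product
from math import sqrt


def kmer_profile(seq, k):
    """Relative frequencies of all 4**k DNA words of length k, in a fixed order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows with ambiguous bases (e.g. N) are skipped
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in words]


def kmer_distance(a, b, k):
    """Euclidean distance between two k-mer profiles (an assumed distance)."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


def edit_distance(a, b):
    """Levenshtein distance, used only as a stand-in for an alignment distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / denom if denom else float("nan")


# Invented toy sequences, not real genomes.
toy = [
    "ATGCGTACGTTAGCATGCGTACGA",
    "ATGCGTACGTTAGCATGCGTTCGA",
    "ATGCCTACGATAGCATGGGTACGA",
    "TTTTAAAACCCCGGGGTTTTAAAA",
]
pairs = list(combinations(toy, 2))
reference = [edit_distance(a, b) for a, b in pairs]

for k in range(2, 7):  # word lengths two to six, as in the text
    word_based = [kmer_distance(a, b, k) for a, b in pairs]
    print(k, round(pearson(reference, word_based), 3))
```

With real genomes the quadratic edit distance would be far too slow, so the reference distances would have to come from an actual alignment tool; the loop over k would stay the same.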

er machine

Whether a word matrix (a k-mer frequency matrix) generated by an alignment-free approach can show a good discrimination power needs to be examined. Therefore, ten SARS-CoV genome sequences and ten SARS-CoV-2 genome sequences were downloaded from NCBI. A word matrix (library) of the 3-mers for these genome sequences was derived. Figure 7.16 shows a heatmap generated for this word matrix. It can be seen that an unsupervised machine learning model can make a good distinction between two types of SARS genomes using the 3-mer word library.
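As a rough illustration of how a 3-mer word matrix can feed an unsupervised model, the sketch below builds the matrix (one row per sequence, one column per 3-mer) and groups the rows with average-linkage agglomerative clustering. Everything specific here is an assumption: the six short sequences are invented stand-ins rather than the NCBI genomes, the clustering method is simply one common choice and may differ from the model behind Figure 7.16, and the names `word_matrix` and `two_clusters` are hypothetical. The heatmap itself is not drawn.

```python
from itertools import product
from math import dist

# The 64 possible DNA 3-mers serve as the column labels of the word matrix.
WORDS = ["".join(p) for p in product("ACGT", repeat=3)]


def word_matrix(seqs):
    """One row per sequence; each value is the relative frequency of a 3-mer."""
    rows = []
    for seq in seqs:
        counts = dict.fromkeys(WORDS, 0)
        for i in range(len(seq) - 2):
            word = seq[i:i + 3]
            if word in counts:  # skip windows with ambiguous bases
                counts[word] += 1
        total = sum(counts.values()) or 1
        rows.append([counts[w] / total for w in WORDS])
    return rows


def two_clusters(rows):
    """Average-linkage agglomerative clustering, merged until two groups remain."""
    clusters = [[i] for i in range(len(rows))]

    def linkage(a, b):
        # Mean Euclidean distance over all cross-cluster pairs of rows.
        return sum(dist(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 2:
        x, y = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        merged = clusters.pop(y)  # y > x, so index x stays valid after the pop
        clusters[x].extend(merged)
    return [sorted(c) for c in clusters]


# Invented toy sequences: three AT-rich and three GC-rich, not real genomes.
toy = [
    "ATATTAATATTTAATATAATTA",
    "ATTTAATATATTAATTATATAA",
    "TATAATTTATATAATATTTAAT",
    "GCGGCCGCGCCGGCGCGGCCGC",
    "CCGCGGCGCCGCGGGCGCCGCG",
    "GGCGCCGCGGCGCGCCGGCGCC",
]
matrix = word_matrix(toy)
groups = two_clusters(matrix)
print(groups)
```

The returned groups are lists of row indices into the word matrix. The same matrix could be passed to a plotting library to draw a clustered heatmap, with the row order taken from the clustering.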